Abstract

Rewiring and optimization of metabolic networks to enable the production of commercially valuable chemicals is a 
central goal of metabolic engineering. This prospect is challenged by the complexity of metabolic networks, lack of 
complete knowledge of gene function(s), and the vast combinatorial genotype space that is available for 
exploration and optimization. Various approaches have thus been developed to aid in the efficient identification of 
genes that contribute to a variety of different phenotypes, allowing more rapid design and engineering of traits 
desired for industrial applications. This review will highlight recent technologies that have enhanced capabilities to 
map genotype-phenotype relationships on a genome wide scale and emphasize how such approaches enable 
more efficient design and engineering of complex phenotypes. 

Introduction

Optimizing microbial metabolism for the production of 
commercially valuable chemicals such as biofuels, che- 
micals and therapeutics is a central aim of metabolic en- 
gineering. This aim is typically approached by altering 
native metabolic networks to promote flux through de- 
sired metabolic pathways while minimizing the buildup 
of potentially toxic intermediates and the formation of 
undesired byproducts. Towards this end, a variety of 
rational engineering approaches have been successfully 
applied. For example, the introduction of non-native 
pathways to promote product formation [1], the over- 
expression of native biosynthetic enzymes [2], the re- 
moval of regulatory repression [3], and modifications 
made to increase precursor metabolite supply [4] have 
all proven effective for improving product yields. 
Such rational modifications however require significant 
a priori knowledge of the pathways in question [5,6]. In 
many cases this knowledge is incomplete, particularly 
for complex phenotypes that require an intricate balance 
between the activities of many seemingly unrelated gene 
products. 
In contrast to rational engineering approaches, "inverse"
metabolic engineering approaches employ directed evolu- 
tion to rapidly explore large adaptive landscapes in search 
of beneficial mutations [7,8]. Traditional genome engi- 
neering methods such as chemical mutagenesis or genome 
shuffling [9] however generate mutations in a random and 
combinatorial fashion and require extensive sequencing 
and characterization to assess genotype-phenotype corre- 
lations and distinguish between adaptive mutations and 
neutral or maladaptive hitch-hiking mutations [10,11]. 
The immense combinatorial sequence space of even a modest
genome size (~4^4,600,000 for the Escherichia coli
genome) requires more rational search strategies that can
identify genes or gene networks that promote the desired
phenotype on laboratory timescales (Figure 1). Such information
can then be leveraged to guide an exploitative
combinatorial optimization of the most relevant genes 
[12], akin to the use of structural and evolutionary infor- 
mation to guide site saturation mutagenesis for protein 
engineering [13]. The development of platform technolo- 
gies that enable genome wide mapping of genes to traits 
has thus been a central challenge for the development of 
more efficient strain engineering. 
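
To give a sense of the scale of this search space, the following minimal sketch computes the order of magnitude of the number of possible sequences for a genome of a given length. The genome length of roughly 4.6 million base pairs for E. coli is an approximation introduced here for illustration, and the function name is hypothetical.

```python
import math

def log10_sequence_space(genome_length_bp: int, alphabet_size: int = 4) -> float:
    """Return log10 of the number of distinct sequences of the given length.

    Working in log space avoids constructing the astronomically large integer.
    """
    return genome_length_bp * math.log10(alphabet_size)

# Approximate E. coli genome size in base pairs (an assumption for illustration).
ECOLI_GENOME_BP = 4_600_000

exponent = log10_sequence_space(ECOLI_GENOME_BP)
print(f"4^{ECOLI_GENOME_BP:,} is roughly 10^{exponent:,.0f}")
```

Under this assumption the number of possible genomes has on the order of 2.8 million decimal digits, which is why exhaustive exploration is not an option.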

Figure 1 Overview of inverse engineering approaches. Inverse engineering begins with a wild-type cell line in which a large number of
mutations are introduced in either a random or directed manner to generate phenotypic diversity. These mutants are subjected to screening or
selection for a trait of interest (e.g. solvent/temperature tolerance or metabolite production). The pre and post selection populations are
sequenced or genotyped by microarray hybridization. The fitness (W_i) of each genotype (i) can be computed by the equation to obtain a
plot of fitness (far right circle) where each bar represents a specific genetic locus. Alleles identified by this approach can be used to guide more focused
combinatorial optimization on the trait of interest using directed techniques such as MAGE [16]. While this provides a promising algorithm for
directed evolution on the genome scale, future technologies will need to be developed to enable this engineering cycle to be performed rapidly
and recursively (indicated by the red arrows) by tracking gene combinations in a high-throughput fashion.

A variety of techniques have been recently developed
to address these challenges and enable targeted approaches
to genome wide modification and tracking
of genotype fitness that are ultimately aimed at speeding
up the genome engineering cycle (Figure 1) [7,8,14]. These 
multiplex "forward" genomics approaches are founded on 
fundamental technological advances in both DNA sequen- 
cing and microarray based DNA detection methods that 
allow quantitative tracking of the concentration of differ- 
ent genotypes in a large population [8,12,14]. Additionally, 
recent advances in multiplexed DNA synthesis [15] and 
the ability to rationally modify bacterial and eukaryotic 
chromosomes using homologous recombination have en- 
abled the production of genome scale libraries with cha- 
racteristics that are more suitable for the short read 
sequencing and array based detection [16-21]. This review 
will focus on approaches to genome wide mapping of 
genotype-phenotype relationships and discuss how such 
methods are being applied to advance a basic unders- 
tanding of and ability to design and engineer complex 
phenotypes. 
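
As a rough illustration of how genotype abundances measured before and after selection can be turned into a fitness score, the sketch below computes a log2 enrichment of each genotype's population frequency. This is only one common convention and is not necessarily the equation used in Figure 1; the function name, the pseudocount, and the example counts are assumptions introduced here.

```python
import math
from typing import Dict

def relative_fitness(pre: Dict[str, int], post: Dict[str, int],
                     pseudocount: float = 0.5) -> Dict[str, float]:
    """Log2 change in each genotype's frequency between two population samples.

    A pseudocount keeps genotypes that drop out after selection from
    producing a division by zero or log of zero.
    """
    pre_total = sum(pre.values()) + pseudocount * len(pre)
    post_total = sum(post.get(g, 0) for g in pre) + pseudocount * len(pre)
    scores = {}
    for genotype, n_pre in pre.items():
        f_pre = (n_pre + pseudocount) / pre_total
        f_post = (post.get(genotype, 0) + pseudocount) / post_total
        scores[genotype] = math.log2(f_post / f_pre)
    return scores

# Hypothetical read or array counts for three library members.
pre_counts = {"geneA": 100, "geneB": 100, "geneC": 100}
post_counts = {"geneA": 400, "geneB": 90, "geneC": 10}
fitness = relative_fitness(pre_counts, post_counts)
for genotype, score in sorted(fitness.items(), key=lambda kv: -kv[1]):
    print(f"{genotype}: {score:+.2f}")
```

Positive scores indicate genotypes that were enriched by the selection, and negative scores indicate genotypes that were depleted.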

Genomic vector library enrichment strategies 

One of the most well established methods for mapping 
genes to fitness at the whole genome scale involves the 
creation of extra-chromosomal libraries of fragmented 
genomic DNA. In these approaches purified genomic 
DNA is digested and cloned into a plasmid backbone 
and transformed into a suitable host strain (Figure 2a). 
Following application of selective pressure or a high- 
throughput screen, vectors containing the enriched gen- 
omic fragments are isolated and subsequently identified 
by hybridization to whole genome microarrays or by 
next-generation sequencing [7,22-24] (Figure 2a). This 
strategy was first demonstrated using high-density whole 
genome microarrays to identify protein-protein interactions
implicated in mRNA splicing using a yeast
two-hybrid screen [22]. Interactions were identified by
co-expression of a DNA binding fusion protein of interest
with a genomic activator fusion library, which allowed
high-throughput array based detection of interacting library
variants from pooled clones [22]. Subsequent studies
using genomic libraries in E. coli demonstrated that
this approach could be implemented to identify genes 
that confer tolerance to the antimicrobial agents in 
Pinesol [23]. 

Figure 2 Genome wide fitness measurement techniques. a) Vector library based approaches such as SCALEs [25] start with cloning of
random DNA fragments that represent different sites of the genome of interest. This diverse vector library is transformed into the host strain and
grown under a defined selective pressure. Following selection the cells are lysed and the vector libraries cut and hybridized to a microarray. The
signal intensities from the microarray are then mapped back to the genomic coordinates to identify traits of interest and measure fitness as
described in Figure 1. b) Overview of TRMR. An array of rationally designed oligos is cleaved and processed in parallel to produce cassettes with
homology arms that guide rational engineering of the entire genome. The TRMR barcodes can then be hybridized to a microarray or PCR
amplified and sequenced to determine the fitness of each mutation. This technique provides a versatile method for engineering different
functional changes into the genome of interest and rapidly tracking the effects.

Genomic vector library approaches have recently been
used to enable more sophisticated genotype-phenotype
mapping. For example, MultiSCalar Analysis of Library
Enrichments (SCALEs) allows simultaneous investigation
of the fitness conferred by individual genes, multi-gene 
fragments, and small operons [25]. SCALEs has been 
successfully applied to the identification of genes that 
improve growth under antimetabolite stress [26], genes 
that restore redox balance in E. coli strains evolved for
succinate production [27], genes that confer tolerance to a
variety of growth inhibiting compounds relevant to cellulosic
biofuels [28-31], as well as to investigate the basic mecha- 
nisms at work in laboratory growth selections [32]. For 
example, it was demonstrated that single batch growth 
predominantly favors microbes with increased growth rate 
while serial batch culturing provides selective pressure for 
both increased growth rate and decreased lag time. These 
results agreed well with a simple mathematical model 
of bacterial growth, suggesting a growing importance 
for mathematical modeling as multiplex fitness mapping 
technologies continue to develop [32].
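
The contrast between single batch and serial batch selection can be sketched with a deliberately simple growth model: a lag phase followed by exponential growth, with nutrient depletion and stationary phase ignored. This is not the model used in [32]; the function names and parameter values below are assumptions chosen only to show how a lag advantage compounds with every transfer while a growth rate advantage scales with total growth time.

```python
from typing import Tuple

Strain = Tuple[float, float]  # (growth rate per hour, lag time in hours)

def log_growth(mu: float, lag: float, batch_hours: float, n_batches: int) -> float:
    """Natural-log fold expansion over n_batches, each starting with a lag phase."""
    return n_batches * mu * max(0.0, batch_hours - lag)

def log_enrichment(mutant: Strain, wild_type: Strain,
                   batch_hours: float, n_batches: int) -> float:
    """Log change in the mutant to wild-type ratio after the selection."""
    return (log_growth(*mutant, batch_hours, n_batches)
            - log_growth(*wild_type, batch_hours, n_batches))

wild_type = (1.0, 2.0)
faster = (1.1, 2.0)        # 10% higher growth rate, same lag
shorter_lag = (1.0, 1.0)   # same growth rate, lag reduced by one hour

# Same total time (24 h) spent as one long batch or as six short batches.
for label, hours, batches in [("single batch", 24.0, 1), ("serial batch", 4.0, 6)]:
    print(label,
          "faster:", round(log_enrichment(faster, wild_type, hours, batches), 2),
          "shorter lag:", round(log_enrichment(shorter_lag, wild_type, hours, batches), 2))
```

In this toy setting the growth rate mutant is favored most in the single long batch, whereas the short lag mutant gains its advantage anew at every transfer of the serial regime.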

One potential limitation of many vector based libraries 
is the inability to identify phenotypes that arise from 
synergistic interactions between distantly spaced loci. 
A recently described technique known as Coexpressing 
Genomic Libraries (CoGEL) was developed to overcome 
this limitation by constructing genomic libraries in multiple
vectors with different replication origins and resistance
markers that can be co-expressed in individual cells
[33]. This approach was demonstrated to successfully 
rescue the auxotrophy of a designed mutant strain in 
which two mutations were introduced into the lysine 
biosynthetic pathway at distal chromosomal loci. To 
demonstrate the utility of this approach for the study of 
more complex phenotypes, CoGEL variants were isolated
from E. coli exposed to acid stress [33]. In addition to
recovering genes involved in known proton exchange
pathways, the study identified unanticipated roles for a
small RNA (arcZ) and recA in acid tolerance [33].
A major challenge for these approaches, as well as sequencing
in general, involves the parallel sequencing of
multiple sites across individual genomes within a complex
cellular mixture. That is, while it is now possible to
sequence multiple sites across a population in parallel,
determining which mutations came from
the same individual cell remains difficult.

Identification of novel functional activities from 
metagenomic vector libraries 

Natural microbial communities provide a rich source of 
diversity that can be prospected for novel metabolic ac- 
tivities and used to expand the capabilities of genetically 
tractable organisms such as E. coli. This approach is 
founded on techniques that were first developed to 
enable extraction and direct cloning of environmental 
DNA to perform 16S rRNA based profiling for microbial
ecology studies [34]. These techniques have since been 
expanded upon for the purpose of identifying novel 
catalysts that perform industrially relevant reactions 
[35,36]. For example, shotgun based cloning of meta- 
genomic DNA and activity screening approaches have 
been used to identify novel amylases [37] and cellulases 
with enhanced stability and activity compared to the 
enzymes represented in cultured organisms [38]. These 
enzymes represent important industrial biocatalysts as 
their activities are critical to the utilization of cellulose for 
the production of next-generation biofuels. 

The complexity of metagenomic samples can limit the 
efficiency with which novel activities can be successfully 
identified, thus necessitating strategies to enrich the 
functional traits of interest prior to cloning. For example,
exposing natural microbial communities to a
selective pressure of industrial interest (e.g. thermotolerance
or carbon substrate utilization) has proven
effective for enriching the metabolic functions that have 
evolved under similar conditions in nature. This ap- 
proach was elegantly demonstrated in a study that iden- 
tified genes associated with carbon uptake in sediments 
taken from Lake Washington [39]. Cultures were grown 
in the presence of 13C-labeled single carbon substrates
to enrich the organisms responsible for carbon uptake,
and their genomic DNA was isolated by density gradient
centrifugation [39]. Target gene enrichment has also been
achieved using degenerate PCR primers to amplify the
gene families or pathways of interest using phylogenetically
conserved priming sites [40]. Homologous recombination
based cloning using the RecET system in Escherichia
coli has also recently proven useful for bioprospecting 
large heterologous gene clusters that perform coordinated 
biochemical functions [41]. This technique has enabled 
direct cloning and characterization of polyketide synthase 
gene clusters ranging from ~10-50 kb from a number of
organisms [41]. These gene clusters are of keen interest as 
they are responsible for the production of many clinically 
important secondary metabolites [42,43].

Transposon saturation mutagenesis 

Disruption or modification of genes within the genome 
provides another useful method for rapid and parallel 
dissection of gene functions. Transposon saturation mu- 
tagenesis has been widely employed due to the efficacy 
of transfection and the wide variety of bacterial species 
for which this technology exists [44-49]. Transposon li- 
braries are constructed using modified transposons that 
enable downstream identification of the genomic inser- 
tion sites by microarray or sequencing. For example, one 
recently developed strategy named Tn-seq takes advantage
of the type IIS MmeI enzyme, which cleaves 20 nucleotides
outside of its recognition site, to identify adjacent
genomic sequences [45]. Transposon mutagenesis ap- 
proaches have been employed to study many aspects of 
adaptive bacterial physiology including gene fitness un- 
der different media conditions [44], bacterial patho- 
genesis [46,50], biofilm formation [51], and motility [47]. 
The high efficiency with which these libraries can be 
constructed has also enabled their use in examining 
pairwise interactions through combinatorial gene knock- 
out strategies [45]. 
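
The read processing step behind such insertion site mapping can be sketched as follows: each sequencing read is searched for the known transposon end, and the short stretch of genomic sequence immediately downstream is extracted and tallied. The transposon end sequence, the flank length, and the function names below are hypothetical placeholders rather than values taken from the Tn-seq protocol, and the subsequent alignment of flanks to a reference genome is omitted.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical transposon end sequence and genomic flank length (assumptions).
TRANSPOSON_END = "TAACAGGTTG"
FLANK_LEN = 16

def extract_flank(read: str, tn_end: str = TRANSPOSON_END,
                  flank_len: int = FLANK_LEN) -> Optional[str]:
    """Return the genomic bases directly after the transposon end, if present."""
    idx = read.find(tn_end)
    if idx == -1:
        return None  # read does not contain the transposon junction
    start = idx + len(tn_end)
    flank = read[start:start + flank_len]
    # Discard reads that are too short to yield a full-length flank.
    return flank if len(flank) == flank_len else None

def count_insertions(reads: Iterable[str]) -> Counter:
    """Count how often each genomic flank (insertion site tag) is observed."""
    flanks = (extract_flank(read) for read in reads)
    return Counter(f for f in flanks if f is not None)

example_reads = [
    "GGG" + TRANSPOSON_END + "ACGTACGTACGTACGT" + "TTT",
    "CC" + TRANSPOSON_END + "ACGTACGTACGTACGT",
    "GATTACAGATTACA",                      # no junction
    TRANSPOSON_END + "AAAA",               # flank too short
]
print(count_insertions(example_reads))
```

Comparing these tallies between conditions then gives a per-insertion measure of fitness, in the same spirit as the enrichment calculations used for other library types.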

In addition to studying the genetics of naturally oc- 
curring phenotypes, transposon libraries have also been 
utilized to study traits evolved by directed evolution for 
the purposes of industrial applications. For example, a 
recent study combined transposon and plasmid overexpression
libraries to investigate the molecular mechanisms
of ethanol tolerance in E. coli [52]. The study
identified key modules involved in maintaining membrane
and cell wall integrity, as well as transcription
factors that regulate osmolyte production and ethanol
degradation [52]. Similarly, Alper et al. have utilized
transposon library approaches to identify genes that 
increase lycopene production titers in E. coli [53,54]. 
These studies collectively identified a number of genes 
that boost lycopene production but were not identified by 
stoichiometric modeling based approaches, thus demons- 
trating the complementarity of inverse and rational meta- 
bolic engineering approaches. 

Rational genomic libraries using homologous 
recombination 

Despite the widespread use of transposon- or plasmid- 
based libraries, the data generated can be difficult to in- 
terpret, low-resolution, and non- or semi-quantitative. 
Moreover the types of mutations introduced are random 
and typically limited to increased dosage or insertions/ 
disruptions. As an example, many insertions can result 
in partial disruption of transcription or translation of 
the gene target, making it difficult to account for the 
functional influence of the mutation. As a second example,
more sophisticated genome-engineering approaches
require capabilities for modifying specific regions of the
genome at a resolution as fine as the single nucleotide 
level. Homologous recombination is well suited to this 
task as it allows precise genomic manipulations across the 
entire genome [55-57]. The utility of homologous recom- 
bination for directed genome engineering was first de- 
monstrated with targeted single gene replacements in 
Saccharomyces cerevisiae [58]. Subsequently it was dem- 
onstrated that this strategy could be expanded to enable 
rapid parallel identification of gene knockouts by inclusion 
of a 20 base pair DNA tag in the replacement cassette that 
can serve as a barcode for microarray based quantification 
of each allele knockout [19]. Homologous recombination 
has since been applied at a genome wide scale to enable 
parallel assessment of allele fitness during growth in a 
variety of conditions [20] and more recently to study 
filamentous growth in yeast as this phenotype is char- 
acteristic of opportunistic yeast pathogens [59]. Similarly 
in E. coli it has been demonstrated that endogenous recET
or λ-red bacteriophage genes can mediate highly efficient
recombination in bacteria [17,55,56]. This has enabled
the construction of similar genome wide libraries in
E. coli that serve as a valuable tool for the research
community [57]. 

More recently, recombineering has been demonstrated 
to provide an important tool for identification of genes 
that confer increased fitness under conditions that are 
common to industrial settings. For example, Warner 
et al. reported the trackable multiplex recombineering
(TRMR) [21] technique, which combines the barcoding 
strategy developed in yeast [19] with the power of ho- 
mologous recombination to efficiently engineer genome 
wide libraries in a single transformation (Figure 2b). The 
authors demonstrated that such a library can be accom- 
plished on a ~1 week timescale using state of the art 
DNA microarray fabrication technologies that enable 
parallel synthesis of a large number of defined oligo- 
nucleotides [15,60-62]. TRMR has been employed to 
identify alleles that improve fitness in the presence of alternative
carbon sources, antimetabolites and cellulosic
hydrolysate, as these represent commonly encountered
stresses that bacteria are faced with in industrial applications
[21]. Interestingly, TRMR established a previously
unidentified role for ahpC in conferring tolerance
to cellulosic hydrolysate. The ability to generate, select,
and genotype TRMR libraries on a rapid time scale
(~1-2 weeks) significantly enhances the throughput
of target identification, and the design of similar gen- 
ome wide cassettes employing a broader range of mu- 
tations is envisioned. Additionally, employing TRMR 
recursively to generate combinatorial genome wide 
mutations could potentially enable in depth analysis 
of genetic interactions important for these and other 
phenotypes. 
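
The general layout of a barcoded recombineering cassette can be illustrated with a short sketch: homology arms copied from either side of the target position flank a unique barcode and a functional payload. This is a schematic only and does not reproduce the actual TRMR cassette design; the function name, arm length, barcode, payload, and toy genome are all assumptions.

```python
def build_cassette(genome: str, insert_pos: int, barcode: str,
                   payload: str, arm_len: int = 40) -> str:
    """Assemble left arm + barcode + payload + right arm around insert_pos.

    The arms are copied from the sequence immediately upstream and downstream
    of the insertion point so that recombination targets that position.
    """
    if insert_pos - arm_len < 0 or insert_pos + arm_len > len(genome):
        raise ValueError("homology arms would run past the end of the sequence")
    left_arm = genome[insert_pos - arm_len:insert_pos]
    right_arm = genome[insert_pos:insert_pos + arm_len]
    return left_arm + barcode + payload + right_arm

# Toy inputs (all hypothetical): a 120 bp sequence, a 20 nt barcode, a short payload.
toy_genome = "ACGT" * 30
toy_barcode = "TTGACCGTAAGCTTAGGCAT"
toy_payload = "AGGAGG"
cassette = build_cassette(toy_genome, insert_pos=60, barcode=toy_barcode,
                          payload=toy_payload, arm_len=20)
print(len(cassette), cassette)
```

Because every cassette carries its own barcode, the abundance of each engineered allele in a mixed population can later be read out from the barcode alone.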

Another application of recombineering has been the 
creation of combinatorial genetic libraries that are superior
in many ways to those generated by random mutation
techniques. For example, Multiplex Automated
Genome Engineering (MAGE) takes advantage of the
high efficiency (~3-30%) of ssDNA mediated allelic replacement
to introduce defined mutations at multiple
chromosomal loci [16] (Figure 1). To demonstrate the 
power of MAGE, Wang et al. chose 24 genes that had 
been previously identified to enhance lycopene produc- 
tion and combinatorially optimized their expression 
using oligonucleotides that introduce degeneracy into 
the ribosome binding site (RBS) of these genes [16]. This 
approach generated ~ 10^ mutations/day and ultimately 
allowed for the identification of strains within 3 days 
that had ~5-fold increased lycopene production. Similarly,
MAGE was utilized for combinatorial introduction
of T7 promoters upstream of genes involved in
aromatic amino acid production in E. coli to rapidly
survey how these expression changes influence pathway
flux using a colorimetric assay for indigo production
[63]. MAGE has also been further extended
to enable rapid hierarchical assembly of mutations across
the entire genome via conjugation [64]. This approach 
known as conjugative assembly based genome enginee- 
ring (CAGE) provides a viable alternative to synthetic 
genome assembly techniques [65,66] for rapidly constructing
chromosomes with a large number of mutations
(~10^/genome).
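
To illustrate how degenerate oligonucleotides generate combinatorial diversity, the sketch below expands a sequence written in IUPAC degenerate notation into all of its concrete variants and computes the resulting library size. The IUPAC code table is standard, but the example pattern is an illustrative assumption and should not be read as the exact oligonucleotide design used in [16].

```python
import math
from itertools import product
from typing import List

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def library_size(degenerate_seq: str) -> int:
    """Number of distinct sequences encoded by a degenerate pattern."""
    return math.prod(len(IUPAC[base]) for base in degenerate_seq)

def expand(degenerate_seq: str) -> List[str]:
    """Enumerate every concrete sequence encoded by a degenerate pattern."""
    choices = (IUPAC[base] for base in degenerate_seq)
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical degenerate ribosome binding site pattern.
rbs_pattern = "DDRRRRRDDDD"
per_site = library_size(rbs_pattern)
print(f"{per_site} variants per site")
# Order of magnitude of the joint space if 24 sites are varied independently.
print(f"about 10^{24 * math.log10(per_site):.0f} combinations across 24 sites")
print(expand("RY"))
```

Even a short degenerate window yields tens of thousands of variants per site, so the joint space across many loci quickly exceeds what any population can sample exhaustively.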

The combination of TRMR and MAGE to globally 
search and locally optimize metabolic networks res- 
pectively offers an exciting algorithm for approaching 
metabolic engineering of complex traits (Figure 1) [12].
Sandoval et al. reported a first attempt to combine these
methodologies for the purposes of optimizing growth 
under industrially relevant conditions [67]. The authors 
selected genes identified by the TRMR data as adaptive 
for growth in acetate, low pH or cellulosic hydrolysate as
targets for recursive RBS engineering by MAGE. In- 
terestingly, many of the clones identified following re- 
cursive MAGE were identical to the single mutants 
identified by TRMR, suggesting the possibility of ne- 
gative epistasis between the targeted alleles [67]. It is 
important to note however, that this strategy required 
isolation of individual clones and sequencing of each 
targeted locus, thus limiting the ability to deeply characte- 
rize the individual fitness profiles within the combinatorial 
library. The development of techniques that enable more 
rapid characterization of the fitness of such combinatorial 
genotypes from a mixed population thus represents an 
important challenge to the future of rational combina- 
torial genome-engineering. 
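
One common way to quantify epistasis between two alleles is to compare the measured fitness of the double mutant with the product of the single mutant fitnesses, all expressed relative to the parent. The sketch below uses this multiplicative convention; it is an assumption made for illustration, not the analysis performed in [67], and the fitness values are hypothetical.

```python
def epistasis(w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of the double mutant from the multiplicative expectation.

    All fitness values are relative to the parent strain (parent = 1.0).
    """
    return w_ab - w_a * w_b

def classify(eps: float, tolerance: float = 0.02) -> str:
    """Label the interaction, treating small deviations as no epistasis."""
    if eps > tolerance:
        return "positive epistasis"
    if eps < -tolerance:
        return "negative epistasis"
    return "no detectable epistasis"

# Hypothetical relative fitness values for two beneficial alleles.
w_a, w_b, w_ab = 1.20, 1.30, 1.30
eps = epistasis(w_a, w_b, w_ab)
print(f"epsilon = {eps:+.2f} ({classify(eps)})")
```

In this example the double mutant is no fitter than the better single mutant, which is the pattern that would make single mutants dominate a combinatorial selection.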

Expression profiling methods

Microorganisms have evolved exquisite transcriptional 
regulatory networks that enable rapid changes in gene 
expression and allow them to adjust their metabolism 
for optimal performance under a wide variety of envi- 
ronmental conditions. The ability to track genome wide 
changes in mRNA expression can thus provide another 
useful metric for identifying functional elements that are 
involved in various adaptive processes. In addition, tran- 
scriptome level analyses can help define the basic orga- 
nization of operons and identify non-coding regulatory 
networks that shape cellular metabolism [68] (Figure 3). 
Unlike the genotyping methods described above, tran- 
scriptional profiling does not provide information on the 
relative fitness of differentially transcribed loci. It can 
however be utilized to survey genome wide changes in 
expression that are correlated with an observable pheno- 
typic change [69-71], or gain insight into the rewiring of 
metabolic networks in evolved lineages [72]. Traditional 
transcription profiling methods have relied on tiling
arrays [7,73]; however, advancements in DNA sequencing
have enabled the development of direct whole cell RNA
sequencing, or "RNA-seq", which allows a much greater
dynamic range of detection than was previously possible 
[74]. Additionally, RNA-seq methods offer potentially 
unbiased approaches to analyze transcript levels and the 
strand origin of coding and non-coding messenger RNAs 
by avoiding cDNA synthesis and amplification steps [74], 
thus enabling more detailed understanding of the trans- 
criptional landscape. 

Figure 3 Expression profiling to identify adaptive gene
expression responses. Changes in gene expression (red line vs.
blue) are often correlated with adaptation to changes in the
environment or genotypic alterations. Genome wide profiling of
such expression changes can thus enable correlations or causal
associations to be imputed between the affected genes and
the trait of interest. Transcriptome analyses can also define
transcriptional units (blue genes) that are co-expressed under
varying conditions to allow annotations of unknown gene functions
or associations between co-expressed genes.

Transcriptome wide analysis has provided a number of
recent insights into the structure of native transcription
networks. For example, integration of transcriptional
profiling with chromatin immunoprecipitation (ChIP) of
RNA polymerase to identify global occupancy of the
transcription machinery enabled the identification of
4661 transcription units in E. coli as well as
defining dynamic expression patterns for many operons 
[68]. In the case of the thrLABC operon for example, 
transcription was found to initiate from the most up- 
stream promoter during log phase, but during lag phase 
an alternative promoter is activated to enable expression 
of the distal genes thrB and thrC and alleviate attenu- 
ation caused by the 5' untranslated region in this poly- 
cistronic mRNA [68]. Such approaches have also yielded 
a wealth of information about the global rewiring of 
these networks during adaptation to different growth 
conditions. For example, transcriptome wide analyses of
Bacillus subtilis under 104 different growth conditions 
allowed for comprehensive identification of regulons 
associated with the various transcription factors in this 
industrially relevant organism [70]. 

Application of comparative transcriptome profiling 
between parental and evolved or engineered cell lines 
can also provide useful insight into the genes whose al- 
tered expression contributes to production and tolerance 
related phenotypes. For example, one study of ethanol 
tolerance in E. coli compared the transcription profile of
a parental strain to multiple laboratory evolved strains
and subsequently confirmed a causal link between ethanol
tolerance and up-regulation of iron transport and amino acid
biosynthesis genes [75]. Supplementation of these metabolites into
5% ethanol containing media enhanced specific growth
rates of the wild-type strain, supporting this hypothesis.
Another study looked at the difference between the transcriptome
of a strain of E. coli that was rationally engineered
for valine production compared to the parental
W3110-strain derivative [72]. The transcriptome of the 
valine producing strain exhibited increased levels of ex- 
pression for the valine biosynthesis genes as intended 
and decreased expression of the tricarboxylic acid cycle 
enzymes, whereas carbon flux through glycolysis and the 
pentose phosphate pathway were unchanged [72]. 
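
The basic comparison underlying such parental versus evolved or engineered transcriptome studies can be sketched as a per-gene log2 fold change of library-size-normalized read counts. The sketch below is a simplified illustration with hypothetical gene labels and counts; it omits replicates, dispersion estimation, and statistical testing, all of which a real differential expression analysis would require.

```python
import math
from typing import Dict

def log2_fold_changes(parent: Dict[str, int], derived: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Log2 ratio of normalized expression (derived strain over parent).

    Counts are scaled by each library's total so that differences in
    sequencing depth do not masquerade as expression changes.
    """
    parent_total = sum(parent.values())
    derived_total = sum(derived.values())
    changes = {}
    for gene in parent.keys() & derived.keys():
        p = (parent[gene] + pseudocount) / parent_total
        d = (derived[gene] + pseudocount) / derived_total
        changes[gene] = math.log2(d / p)
    return changes

# Hypothetical read counts for three genes in each strain.
parent_counts = {"biosynthesis_gene": 100, "tca_gene": 400, "housekeeping_gene": 500}
derived_counts = {"biosynthesis_gene": 400, "tca_gene": 100, "housekeeping_gene": 500}
lfc = log2_fold_changes(parent_counts, derived_counts)
for gene, value in sorted(lfc.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {value:+.2f}")
```

Genes with large positive or negative values become candidates for follow-up, although expression changes alone do not establish that a gene contributes to the phenotype.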

In addition to monitoring changes in expression due 
to environmental perturbations, transcriptional profiling
has been recently employed to better understand tran- 
scriptional networks that have been rewired by synthetic 
means. For example, global transcription machinery 
engineering (gTME) involves the creation of plasmid en- 
coded libraries of major sigma factors that have altered 
DNA binding preferences [76]. Following application of 
a selective pressure the mutant sigma-factors and the 
corresponding transcriptional changes can be charac- 
terized to identify processes that enhance survival or 
growth. Although gTME has been found to produce a 
large number of differentially expressed genes in evolved 
lineages [76,77], transcriptional profiling experiments 
have aided in the identification of genes that more 
than double the production of the essential amino
acid L-tyrosine [78]. 

Whole genome sequencing (WGS) 

Despite the power to map genotype fitness genome wide, 
the techniques described above lack the ability to com- 
pletely define the genotype of an engineered or evolved 
strain, leaving open the possibility that off target mu- 
tations may influence the observed trait. While whole 
genome sequencing (WGS) can address this challenge, it 
has historically been costly and labor intensive. However, 
advances in next generation sequencing have occurred at 
an astounding rate. Current technologies enable ~ 10^^ kbp 
per sequencing run [79], and the increased read length of real
time sequencing technologies [80-82] promises to boost this
output further. 

Long term laboratory evolution studies now utilize 
WGS to map mutations onto growth phenotypes. WGS 
has therefore enabled exciting new insights into the mo- 
lecular mechanisms of adaptation to a variety of stresses. 
For example, a recent study sequenced the genomes of 
29 E. coli clones taken from different time points during 
the long term evolution of the population for increased 
citrate utilization [83]. The sequences revealed a gene 
duplication event that placed a non-transcribed citrate
transporter gene citT in front of an actively transcribed 
promoter to enable gene expression. Another study re- 
cently performed 115 parallel selections for thermotole- 
rance (growth at 42.2°C) followed by WGS of a single 
isolate from each population [84]. The study found that 
although adaptive mutations often do not precisely 
overlap at the nucleotide level, convergence can readily 
be detected at the single gene and operon level with 
relatively few isolates. The study also found epistatic 
"blocks" that seemed to provide different potential routes 
for further adaptation [84]. Notably, mutations identified 
using WGS included not only point mutations, but a 
number of gene duplications, insertions, deletions, and 
rearrangements all in combinatorial sets that would be dif- 
ficult to identify by current tracking technologies. 
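
Convergence at the gene level, even when no two isolates share the same nucleotide change, can be detected by mapping each mutation to the gene that contains it and counting how many independent isolates hit each gene. The sketch below illustrates this with hypothetical gene coordinates and mutation positions; the names and data are assumptions and are not taken from [84].

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), coordinates inclusive

def gene_at(position: int, genes: List[Gene]) -> Optional[str]:
    """Return the name of the gene containing the position, if any."""
    for name, start, end in genes:
        if start <= position <= end:
            return name
    return None

def isolates_per_gene(mutations: Dict[str, List[int]],
                      genes: List[Gene]) -> Dict[str, int]:
    """Count how many distinct isolates carry at least one mutation in each gene."""
    hits = defaultdict(set)
    for isolate, positions in mutations.items():
        for position in positions:
            gene = gene_at(position, genes)
            if gene is not None:
                hits[gene].add(isolate)
    return {gene: len(isolates) for gene, isolates in hits.items()}

# Hypothetical annotation and mutation calls for three evolved isolates.
annotation = [("geneX", 100, 500), ("geneY", 600, 900)]
mutation_calls = {
    "isolate_1": [120, 950],
    "isolate_2": [310],
    "isolate_3": [450, 700],
}
print(isolates_per_gene(mutation_calls, annotation))
```

Here no position is mutated twice, yet all three isolates carry a change in the same gene, which is the kind of signal that points to a shared adaptive target.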

On the other hand, although WGS provides a com- 
plete description of the genome, it is limited to relatively 
small sample sizes making it impossible to determine 
the relative fitness of different mutational combinations.
Determining the fitness of combinatorial mutations
requires deep sampling of the combinatorial
sequence space of a directed set of mutations. As 
tracking technologies are expanded from single allele 
approaches to determining the fitness of many gene- 
tic combinations at high depth they will open the 
door to the evaluation of increasingly complex geno- 
type-phenotype relationships. WGS should therefore 
be considered as a complementary approach to cur- 
rent and future tracking technologies that will aid in 
improving the accuracy with which relevant muta- 
tions are identified. 

Conclusions 

To the benefit of basic and applied research alike, genome
wide tracking technologies have significantly enhanced
the throughput with which genotype-phenotype
relationships can be investigated. Approaches such as
TRMR [21] and SCALEs [25], for example, can produce
high resolution fitness maps covering the entire E. coli
genome and have aided in uncovering genes implicated
in a variety of industrially relevant traits with rapid turnover
times. Recent efforts to combine
TRMR with combinatorial engineering strategies similar
to MAGE [16] promise to significantly increase the throughput
of strain engineering programs (Figure 1) [29]. This
study, however, highlighted the importance of epistatic
interactions, as many of the colonies isolated after combinatorial
engineering contained only singly mutated
ribosome binding sites, as well as the need for technologies
to more deeply analyze populations that have been
engineered by multiplexed approaches.

The techniques described here will need to be im- 
proved such that multigenic traits can be characterized 
in parallel and the engineering cycle can be performed 
recursively. Recent progress towards more effective 
combinatorial tracking has been made using synthetic 
RNA based regulatory devices that enable multiplexed, 
sequence-specific gene control from a single plasmid [85].
Similar approaches could also be readily envisioned using 
the recently described CRISPR system in which nuclease
inactive Cas9 was demonstrated to provide inducible
transcription repression based solely on the
sequence of a synthetic guide RNA [86]. Sequence 
specified combinatorial libraries that sample different 
genes at varying expression levels in these systems 
therefore offer an exciting opportunity to more quickly 
survey adaptive landscapes in search of more optimal 
engineering solutions. Combinatorial tracking approaches
also offer the promise of revealing new sources of epistasis
in the complex genetic networks of living organisms. 
Techniques such as expression profiling and WGS 
will also continue to provide complementary tools 
that enhance our knowledge of complex phenotypes 
and, importantly, our ability to engineer new and use- 
ful traits. 